Interconnection Networks for High-Performance Computing: Electronics v. Optics

Authors

Assaf Shacham

Benjamin A. Small

Keren Bergman

Abstract

Ultra-high-capacity optical interconnection networks, capable of supporting data rates of terabits per second per port, are considered a potential solution to the immense processor-memory access bottleneck expected to dominate high-performance petaflop-scale supercomputers. All-optical switches have the potential to supply this next-generation technology with the performance characteristics necessary for efficient communication among supercomputer processor, memory and storage elements, even in systems containing thousands of such elements. Although the recent stagnation in the telecommunications industry has temporarily slowed the development of lightwave switching technologies, these systems still offer significant advantages over electronic switches, specifically in terms of bandwidth, scalability and latency. This brief compares electronic and optical switching technologies and argues that purpose-designed all-optical switching architectures can provide substantial technological advantages, outperforming current and future electronic switches.

Motivation

Projections of the future needs of supercomputing systems in the United States’ national security community, as well as in other applications, have identified the “requirement for high bandwidth/low latency interconnects and switches, between logic, memory and storage” as “the most important performance metric” for supercomputer technological progress [1]. Supercomputers are usually divided into two classes. The first, known as type-T, is based on high-performance commodity processors and does not require a high level of interconnection between the processor and memory elements. These systems are simpler to realize, since processor technology is mature and its development is constantly driven by the commercial high-end computing market.
The drawback of type-T systems is their strong reliance on multi-level caching and memory access locality, which makes them inefficient in applications with little memory address locality (e.g., weather forecasting, encryption and decryption, and data mining). Systems of the second type, type-C, are based on fairly simple high-performance processors, specifically designed for supercomputing applications and connected by a tightly coupled, high-performance interconnection network. The interconnection network at the core of a type-C supercomputer is a high-bandwidth, low-latency switching fabric with thousands or even tens of thousands of ports to accommodate processors, caches, memory elements and storage devices. Low message latency and high throughput under heavy traffic loads are the key metrics of the core switching fabric required to achieve high performance in type-C supercomputers. Thus, an effective solution for the interconnection network directly addresses some of the most important computing challenges of the future.

Electronic Switching

Electronic technology continues to make huge strides in high-speed signaling and switching. Data rates of 10 Gbps have been realized on electronic serial transmission lines over distances on the order of tens of centimeters on printed circuit boards, and up to 5 meters on shielded differential cables [2]. Complementary metal-oxide-semiconductor (CMOS) technology has enabled the development of application-specific integrated circuits (ASICs) with increasing speed and decreasing die area. These ASICs, used as switching fabrics, memory elements, and arbitration and scheduling managers, have shown steadily increasing performance over a long period of time.
Moore’s Law has dominated the entire semiconductor industry for more than 30 years; in the form used here, it states that the minimum pitch of an integrated circuit shrinks by a factor of 0.7 every 18 months, so that circuit speed is expected to increase and die area to decrease accordingly [3]. Semiconductor companies are currently shipping ASICs containing 144×144 fabrics at 3.6 Gbps per port (Vitesse, VSC3140) [4] and single-stage fabrics of 16×16 ports at 2.5 Gbps per port (Agere, PI40SAX). Agere further claims to have begun production of a multistage fabric (PI40) that scales up to 2.5 Tbps [5]. Next-generation semiconductor processes, combined with device parallelism and multistage architectures such as the Clos network [6], can increase the total switching capacity of a multistage electronic switch to a 128×128 matrix of 40 Gbps ports [7]. Note, however, that the extensive use of multistage architectures carries an important latency penalty, as well as other drawbacks in design complexity, packaging and power requirements.

This remarkable progress is expected to slow down in the near future. The International Technology Roadmap for Semiconductors (ITRS) 2003 report projects that Moore’s Law will slow significantly over the coming decade, reaching 18 nm processes by 2018 [8]. These projections indicate an effective 50% deceleration of Moore’s Law. This slowdown is likely to become more pronounced, forcing the semiconductor industry to search for new technologies to address ever-increasing traffic rates. Other electronic issues, such as pin count, packaging density and transmission-line limitations, also present a growing challenge to the design of high-rate electronic switches. There are three commonly used architectures in electronic switching fabrics: arbitrated crossbars, buffered crossbars and shared memory switches.
Each of these architectures has particular advantages and disadvantages, and each has a limiting factor that will impede, and perhaps halt, future progress. Arbitrated crossbars, although simple and power-efficient, require a very complex scheduler. The iSLIP scheduling algorithm, developed in the Tiny Tera project at Stanford University and a de facto industry standard, requires O(N) logic blocks and multiple time-consuming iterations to find an input-output matching for an N×N crossbar [9]. As line rates and port counts per switch increase, this arbitration becomes too complex to complete within the given latency and implementation constraints. Quality of Service (QoS) standards, which require more complex algorithms to guarantee priorities and latencies, present even greater challenges. The request-grant communication time between the nodes and the central scheduler grows relative to the cell time as the line rate increases; this increases the buffering delay of traffic and thus pushes overall system latency into the microsecond range. Buffered crossbars, designed to bypass the complexity problem by assigning a dedicated FIFO memory block to each input-output path, do lower the complexity but require a vast amount of memory, O(N·rate), resulting in unmanageable scaling of power consumption and cost. Shared memory architectures use a single memory block, possibly multi-banked, with time-division multiplexed (TDM) access for all reads and writes. This approach trades smaller memory-size requirements for very high memory-bandwidth requirements. The switch bandwidth, including the speedup required to compensate for inherent inefficiencies, is limited to just below half the memory bandwidth, since every cell must be written into the memory once and read out once. This strongly couples the architecture’s performance to progress in memory technology, which limits its future development. The use of multiple parallel memory blocks is also limited by pin count and density.
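
To make the arbitration cost concrete, below is a minimal Python sketch of one cell time of iSLIP, following its published request-grant-accept description: each unmatched output grants the requesting input nearest its round-robin grant pointer, each input accepts the granting output nearest its accept pointer, and both pointers advance only for matches made in the first iteration. The function name, the boolean request matrix (one virtual output queue per input-output pair) and the default of three iterations are illustrative assumptions; hardware evaluates the N grant and N accept arbiters in parallel rather than in nested loops.

```python
from typing import List, Optional


def islip(requests: List[List[bool]],
          grant_ptr: List[int],
          accept_ptr: List[int],
          iterations: int = 3) -> List[Optional[int]]:
    """One cell time of iSLIP on an N x N crossbar (illustrative sketch).

    requests[i][j] is True when input i has a cell queued for output j.
    grant_ptr[j] and accept_ptr[i] are round-robin pointers, updated in
    place and only for matches made in the first iteration.
    Returns match[i] = output matched to input i, or None.
    """
    n = len(requests)
    match_in: List[Optional[int]] = [None] * n   # input -> output
    match_out: List[Optional[int]] = [None] * n  # output -> input

    for it in range(iterations):
        # Request + grant: each unmatched output grants the first requesting,
        # still-unmatched input at or after its grant pointer.
        grants: List[List[int]] = [[] for _ in range(n)]  # input -> granting outputs
        for j in range(n):
            if match_out[j] is not None:
                continue
            for k in range(n):
                i = (grant_ptr[j] + k) % n
                if match_in[i] is None and requests[i][j]:
                    grants[i].append(j)
                    break

        # Accept: each input with grants accepts the first granting output
        # at or after its accept pointer.
        progress = False
        for i in range(n):
            if not grants[i]:
                continue
            j = min(grants[i], key=lambda out: (out - accept_ptr[i]) % n)
            match_in[i], match_out[j] = j, i
            progress = True
            if it == 0:
                # Pointers move one past the accepted match (first iteration only).
                accept_ptr[i] = (j + 1) % n
                grant_ptr[j] = (i + 1) % n

        if not progress:  # no new matches: further iterations cannot help
            break
    return match_in
```

On a fully loaded 3×3 switch with all pointers at zero, the first call needs all three iterations to match inputs 0, 1, 2 to outputs 0, 1, 2; the pointer updates then desynchronize the arbiters, so a second call returns [1, 0, 2]. Each extra iteration is another grant and accept pass, which is the latency cost the text attributes to arbitrated crossbars as N and line rates grow.
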
The latency of state-of-the-art shared memory devices is not specified, as it depends on traffic load and other parameters, but it is estimated at around 1 μs. The spatial extent of large port-count systems presents another challenge to electronic systems. Transmitting high-rate electrical signals across the long transmission lines that connect the nodes to the switching fabric (on the order of meters to a hundred meters) is impossible with today’s technology. This problem is overcome by multiple internal E/O/E (electrical-optical-electrical) conversions that enable optical transport. An excessive number of such conversions between electronic and optical signals presents power and cost issues that may be prohibitive. In general, electronic switching technology, well proven and mature, can certainly meet the needs of today’s Internet traffic, and probably those of the next generation as well. However, the exponential growth of traffic in the longer term outpaces the projected progress in electronic technology. Even today, the latency performance of these devices does not adequately meet the demands of the high-performance supercomputing industry. This paper claims that optical switching has the potential to address the shortcomings of electronic switching, especially in the long term. Optical switching latency is shorter by an order of magnitude, since the switching is done while the data propagates at the speed of light, without any buffering or storage. Optical transmission rates are higher by orders of magnitude and are not limited by the spatial size of the switch.

Lightwave Devices

Before developing an all-optical interconnection network architecture, it is necessary to understand the components available for implementing such a system. Nearly all lightwave device technologies differ in many fundamental ways from the conventional electronics found in contemporary packet switches.
Fiber optic systems have traditionally contained relatively simple components such as optical amplifiers (e.g., erbium-doped fiber amplifiers, Raman amplifiers, semiconductor optical amplifiers), grating filters, optical couplers and polarization controllers. These components have completely different system-level functionality from any of the electronic components found in conventional switches. However, well-planned lightwave system designs can yield similar behavior by combining fiber optic components creatively. For example, correctly arranged couplers and amplifiers behave in a manner analogous to an electronic fan-out, and amplifiers with sufficient switching ratios can be made to realize a simple switch. Additionally, the nature of optical communication allows even more flexibility than electronics. Dense wavelength division multiplexing (DWDM), which lets multiple wavelength channels share a single fiber and thereby multiplies its transmission capacity, is perhaps the greatest benefit of lightwave systems. Using this technique, a recent record experiment realized transmission rates of 40 Gbps per channel in a system with 273 wavelengths on a single link [10], yielding an aggregate transmission rate of more than 10 Tbps. An architecture that could fully utilize this bandwidth with substantial throughput under heavy traffic would approach the ultimate “infinite bandwidth” switch. Moreover, a well-designed lightwave interconnection network architecture would include direct optical pathways from the input ports to the destination ports, allowing information to travel at the speed of light throughout the entire switching fabric. Such a transmission scheme, which is entirely realistic, allows latency to be limited predominantly by time of flight, which can certainly be reduced to hundreds of nanoseconds in large-scale (10k-port) fabrics.
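
The two quantitative claims in this paragraph follow from simple arithmetic. The sketch below assumes a group index of about 1.47 for silica fiber (roughly 5 ns of propagation delay per meter); the 20 m path length is a hypothetical machine-room-scale example, not a figure from the text.

```python
C_VACUUM_M_PER_S = 299_792_458.0  # speed of light in vacuum
GROUP_INDEX_FIBER = 1.47          # assumed typical group index of silica fibre


def fiber_delay_ns(length_m: float, n_group: float = GROUP_INDEX_FIBER) -> float:
    """Time of flight through length_m metres of fibre, in nanoseconds."""
    return length_m * n_group / C_VACUUM_M_PER_S * 1e9


def aggregate_tbps(channels: int, gbps_per_channel: float) -> float:
    """Aggregate DWDM capacity of a single link, in Tb/s."""
    return channels * gbps_per_channel / 1000.0


print(f"{fiber_delay_ns(1.0):.2f} ns per metre")
print(f"{fiber_delay_ns(20.0):.0f} ns across a hypothetical 20 m path")
print(f"{aggregate_tbps(273, 40):.2f} Tb/s on one fibre")
```

A few tens of meters of fiber therefore contribute on the order of 100 ns, which is why time of flight, rather than buffering, sets the latency floor; 273 channels at 40 Gb/s give about 10.9 Tb/s, consistent with the "more than 10 Tbps" quoted above.
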
On the other hand, optical switching technology presents some challenges that electronic technology addresses very well. Conventional processor and memory units have been used for decades in electronic switching fabrics with very high speed and performance. In contrast, optical signals cannot be buffered efficiently for arbitrary periods, and optical data processing cannot be done as efficiently as electronic processing. Perhaps the biggest concern for a well-designed system is the signal noise introduced by the optical components. Electronics easily performs signal reshaping and regeneration, whereas an all-optical solution that relies on optical amplification necessarily introduces fundamental noise (amplified spontaneous emission). Careful system planning can minimize these signal distortions: high-extinction-ratio devices can be employed, and optical isolators can mitigate the propagation of some of the noise.

Optical Switching Architectures

The aforementioned differences between electronic and optical implementations call for novel architectures designed specifically for optical packet switching, rather than imposing traditional electronic architectures on a contemporary lightwave system. Such architectures could use self-routing packets, alleviating the need for central scheduling and processing, and deflection routing instead of packet buffering as the means of contention resolution. The literature presents several practical large-scale optical packet switches [11], most of which use the semiconductor optical amplifier (SOA) as the central active optical component. The benefits of SOA-based switching elements are well understood: the potential for very high-speed switching, high extinction ratios, sizable operating bandwidth and reasonably low added noise [12-13].
Furthermore, the fabrication of multiple SOA devices on a single substrate [14] demonstrates their potential for high-density integration and should eventually decrease the price of such components. NEC and Alcatel have presented very similar optical switches in [7] and [15], both based on a broadcast-and-select crossbar matrix in which the selection is performed by SOA gates. Despite attractive latency and noise characteristics, these systems have fundamental problems. A simple crossbar implementation requires N² SOA switching gates for an N×N fabric, leading to high cost and poor scalability. Both systems stack channels in the wavelength domain to reduce the number of SOA gates required to the order of N; however, this wavelength stacking limits the number of C-band channels available for user data and fails to exploit the full bandwidth potential of DWDM transmission. Another solution, proposed by Chiaro Networks [16], uses its proprietary Optical Phased Array (OPA) technology in a novel switching fabric. This fabric is fundamentally a crossbar system, but it uses GaAs-based electro-optic devices as the switching elements instead of SOAs. These systems are examples of conventional electronic architectures implemented with lightwave devices, and consequently they lose most of the optical advantages. A crossbar topology (or the OPA, for that matter) requires both complex scheduling and intensive buffering and queue management. These tasks, which cannot be performed entirely in the optical domain, must be executed by ASIC schedulers and DRAM buffers, yielding what is really a hybrid switching technology rather than a true all-optical switching fabric. Only a system with a different architecture, one that truly exploits the benefits of optical switching, can make the performance leap and offer significant advantages over electronic interconnection networks.

Data Vortex

The Data Vortex, a novel all-optical packet interconnection network currently being developed at Columbia University’s Lightwave Research Laboratory, is an original architecture designed specifically to take advantage of optical technologies. A detailed discussion of the operation, functionality and behavior of the Data Vortex is beyond the scope of this document, but is available in the published literature [17-19]. The Data Vortex can be envisioned as a collection of concentric cylinders, each containing a mesh of interconnected switching nodes. Packets propagate from the outermost cylinder toward the innermost cylinder, and deflections occur within the individual cylinders. Each switching node has two input ports and two output ports; the current implementation therefore requires two SOA gates and a series of optical coupling elements per node. The switching nodes also make the routing decision, based on information coded on particular DWDM channels and on signals exchanged between adjacent nodes to avoid packet collisions; very little control logic is required [17]. Besides solving the processing and buffering problems that would normally plague sophisticated all-optical switching fabrics, the novel architecture of the Data Vortex offers numerous attractive features, especially for supercomputing applications.

Port count and scalability. The Data Vortex scales very well, requiring approximately O(N ln N) switching elements for an N×N-port system. The primary concern for the size of this topology is signal distortion: through how many nodes can a packet travel and still maintain an acceptable bit error rate? Recent simulations have shown that a moderately loaded 8192×8192 Data Vortex switch requires fewer than 45 node “hops” for 99.99% of the injected packets [20].
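
The scalability question in this paragraph has two parts, hardware count and accumulated noise, and both can be sketched numerically. The geometry used below (H heights, A angles, log2(H) + 1 cylinders of A·H nodes each, and N = A·H ports) is our reading of the published Data Vortex descriptions [17-19] and should be treated as an assumption; A = 4 and H = 2048 are hypothetical values chosen to give 8192 ports. The noise model is the standard cascade rule for identical amplifying stages whose ASE noise powers add, with a hypothetical 40 dB per-stage OSNR; it is not a measured figure for Data Vortex nodes.

```python
import math
from typing import Sequence, Tuple


def data_vortex_nodes(angles: int, heights: int) -> int:
    """Node count under the ASSUMED geometry: log2(heights) + 1 cylinders,
    each holding angles * heights nodes. heights must be a power of two."""
    if heights < 1 or heights & (heights - 1):
        raise ValueError("heights must be a power of two")
    cylinders = heights.bit_length()  # equals log2(heights) + 1 for powers of two
    return angles * heights * cylinders


def soa_gates(angles: int, heights: int) -> Tuple[int, int, int]:
    """Return (ports, Data Vortex SOA gates, N x N crossbar SOA gates)."""
    ports = angles * heights                         # assumed N = A * H
    vortex = 2 * data_vortex_nodes(angles, heights)  # two SOA gates per node
    crossbar = ports * ports                         # one gate per crosspoint
    return ports, vortex, crossbar


def cascaded_osnr_db(stage_osnr_db: Sequence[float]) -> float:
    """OSNR after a chain of stages whose ASE noise powers add:
    1 / OSNR_total = sum(1 / OSNR_i), with each OSNR in linear units."""
    return -10 * math.log10(sum(10 ** (-s / 10) for s in stage_osnr_db))


print(soa_gates(4, 2048))                        # hypothetical A = 4, H = 2048
print(round(cascaded_osnr_db([40.0] * 45), 1))   # 45 hops at a hypothetical 40 dB each
```

Under these assumptions the 8192-port configuration needs 196,608 SOA gates against 67,108,864 for a crossbar, and 45 identical 40 dB stages degrade the OSNR to about 23.5 dB, a penalty of 10·log10(45) ≈ 16.5 dB. The hop-count distribution, not the port count alone, therefore bounds the practical size of the fabric.
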
Recent test-bed experimentation has shown that a bit error rate of better than 10 can still be preserved with this number of node hops [21-22]. These figures will only improve with future device generations.

Bandwidth. The Data Vortex leaves the wavelength domain entirely available to the user. Using 40 Gbps per channel and 100 DWDM channels per fiber, the bandwidth of each link within the Data Vortex could reach 4 Tbps. A system with tens of thousands of ports could therefore achieve hundreds of petabits per second of aggregate bisection capacity.

Latency. The latency of a given packet in the Data Vortex depends on the path it takes through the switch. This path is not deterministic, but analysis and simulations have shown that the spread of the hop-count distribution is fairly narrow [20-22]. Using state-of-the-art optical and electronic components, latency can be reduced to hundreds of nanoseconds in large-scale (>10k-port) systems. Simulations of a heavily loaded 32k×32k Data Vortex with randomly distributed traffic have shown a median hop count as low as 28 hops per packet, with fewer than 0.01% of packets exceeding 50 hops. A reasonable hop delay of 5 ns, achievable with current off-the-shelf components, yields a median latency of 140 ns and a maximum latency of 240 ns for this 100 petabit/second bisection-bandwidth system. With some optical component integration, these latency figures could realistically be reduced by a factor of 5. Research is currently under way, by simulation and analysis, to optimize the packet injection methods in order to reduce latency at a given bandwidth and to reduce the variance of the latency distribution.

These features make the Data Vortex a very good candidate for the interconnection network backbone of a type-C supercomputer.
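
The bandwidth and latency figures above reduce to a few multiplications. The sketch reads "32k" as 32,768 ports and "aggregate capacity" as the sum of per-port link capacities; both readings are assumptions, and the latency model counts only node hops, ignoring fiber runs outside the fabric.

```python
def link_capacity_tbps(channels: int, gbps_per_channel: float) -> float:
    """Capacity of one DWDM link, in Tb/s."""
    return channels * gbps_per_channel / 1000.0


def aggregate_pbps(ports: int, link_tbps: float) -> float:
    """Sum of per-port link capacities, in Pb/s (assumed reading of 'aggregate')."""
    return ports * link_tbps / 1000.0


def latency_ns(hops: int, ns_per_hop: float) -> float:
    """Latency of a packet traversing `hops` nodes; fibre runs are ignored."""
    return hops * ns_per_hop


link = link_capacity_tbps(100, 40)   # 100 channels x 40 Gb/s
ports = 32 * 1024                    # "32k x 32k" read as 32,768 ports
print(f"per-link capacity: {link:.1f} Tb/s")
print(f"aggregate capacity: {aggregate_pbps(ports, link):.0f} Pb/s")
print(f"median latency (28 hops x 5 ns): {latency_ns(28, 5.0):.0f} ns")
print(f"tail bound (50 hops x 5 ns): {latency_ns(50, 5.0):.0f} ns")
print(f"median with 5x faster hops: {latency_ns(28, 5.0 / 5):.0f} ns")
```

This gives 4.0 Tb/s per link and about 131 Pb/s for 32,768 ports, the same order as the 100 Pb/s quoted above, and a 140 ns median. The 50-hop tail bound works out to 250 ns, and a fivefold reduction in per-hop delay would bring the median to about 28 ns.
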
Its scalability to thousands or tens of thousands of ports enables connections to as many processor or memory nodes as required, and its low latency allows efficient computing with a high degree of parallelism. The currently realized Data Vortex packet structure requires destination addresses to be encoded on specifically designated wavelength channels, and very specific timing and synchronization requirements must also be met. These characteristics make it somewhat challenging to adapt the Data Vortex to IP networks (although this challenge can be addressed by designing application-specific input interfaces that perform the requisite packet encoding and injection timing). A type-C supercomputer that includes the Data Vortex could be programmed to generate the correct packet structure natively, eliminating the need for intermediate interfaces.

Technology and Price Comparison

Future developments in optical and optoelectronic component technology will further reduce the latency of each node hop and the signal degradation at each switching node. Simultaneously, faster electronic control logic will aid this endeavor. It is even envisioned that high-speed all-optical data processing, as in [23], could replace the current electronic technology for the routing logic. So far, market considerations have been left out of this evaluation of switching architectures; for most customers, however, these aspects are of great concern. During fiscal year 2003, prices for optical components were significant, with SOAs costing as much as $1000 (USD) in bulk quantities, and discrete coupler and filter components around $250 each. Another significant cost comes from the components required to convert optical signals into electrical ones; 10 Gbps optoelectronic devices can approach $1000.
Compared to the bandwidth supplied, however, lightwave paths are more reasonably priced, at less than $100 per Gbps. Although these optical components are currently far more expensive than electronic components, their cost has been decreasing over the years due to improvements in fabrication and increased supply. It is quite likely that by the end of this decade the prices of optical components will be on par with those of high-speed electronic components. Electronic switching devices, on the other hand, are expected to continue following Moore’s Law in cost as well as in performance. Current switching fabric integrated circuits ship at $500 to $1200 per fabric, depending on rate, which corresponds to about $4 to $7 per Gbps of traffic rate. These costs have been quite stable over the last few years because of the economic downturn, and are expected to decrease with Moore’s Law in the coming years. The aforementioned prediction of a slowdown, or even a halt, of Moore’s Law applies to the decrease in integrated circuit costs as well: since semiconductor technology is mature and its characteristics are well known, no major breakthroughs leading to significant cost reductions are expected.

Figure 1 illustrates the projected cost per bandwidth of the Data Vortex and of electronic switching fabrics. It is based on the following assumptions:
1. The costs of optical devices, namely SOAs, filters and couplers, will decline significantly due to improvements in the fabrication and fiber-coupling processes.
2. Thousands of ports will be available in future implementations of the Data Vortex; increased bit rates and channel density will also be possible.
3. Electronic technology will continue to scale with Moore’s Law, decelerated by a factor of 2 (as discussed above).
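
Assumption 3 can be turned into a projection for the electronic curve. The sketch reads "decelerated by a factor of 2" as stretching the 18-month halving period to 36 months and starts from the $4 to $7 per Gbps range quoted above for 2003; both readings are assumptions, and the optical curve is not modeled because the text gives no decline rate for it.

```python
def projected_cost_per_gbps(cost_2003: float, year: int,
                            halving_months: float = 36.0) -> float:
    """Electronic switching cost per Gb/s, ASSUMING it halves every
    `halving_months` (36 = the 18-month Moore's-law period slowed by 2x)."""
    months = (year - 2003) * 12
    return cost_2003 * 0.5 ** (months / halving_months)


for year in (2003, 2006, 2009, 2012):
    low = projected_cost_per_gbps(4.0, year)
    high = projected_cost_per_gbps(7.0, year)
    print(f"{year}: ${low:.2f} to ${high:.2f} per Gb/s")
```

Under this reading the electronic cost falls to $2.00 to $3.50 per Gbps by 2006 and $1.00 to $1.75 by 2009; passing `halving_months=18` instead shows how much faster the undecelerated law would have driven costs down.
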


Similar resources

Performance Analysis of a New Neural Network for Routing in Mesh Interconnection Networks

Routing is one of the basic parts of a message passing multiprocessor system. The routing procedure has a great impact on the efficiency of a system. Neural algorithms that are currently in use for computer networks require a large number of neurons. If a specific topology of a multiprocessor network is considered, the number of neurons can be reduced. In this paper a new recurrent neural ne...


Design of scalable optical interconnection networks for massively parallel computers

The increased amount of data handled by current information systems, coupled with the ever growing need for more processing functionality and system throughput is putting stringent demands on communication bandwidths and processing speeds. While the progress in designing high-speed processing elements has progressed significantly, the progress on designing high-performance interconnection netwo...


Optical Interconnections in Parallel Radar Signal Processing Systems

Optical interconnection networks is a promising design alternative for future parallel computer systems. Numerous configurations with different degrees of optics, optoelectronics, and electronics have been proposed. In this paper, some of these interconnection networks and technologies are briefly surveyed. Also, a discussion of their suitability in radar signal processing systems is provided, ...


Reduction of Energy Consumption in Mobile Cloud Computing by Classification of Demands and Executing in Different Data Centers

In recent years, mobile networks have faced with the increase of traffic demand. By emerging mobile applications and cloud computing, Mobile Cloud Computing (MCC) has been introduced. In this research, we focus on the 4th and 5th generation of mobile networks. Data Centers (DCs) are connected to each other by high-speed links in order to minimize delay and energy consumption. By considering a ...


Optical Centralized Shared Bus Architecture for High-Performance Multiprocessing Systems

With the increasing demand for solving more complex problems, high-performance multiprocessing systems are attracting more and more research efforts. One of the challenges is to effectively support the communications among the processes running in parallel on the multiprocessors. Due to the physical limitations of electrical interconnects, interconnection networks impose a potential bottleneck ...



Publication date: 2004
